40 research outputs found
Speeding up Convolutional Neural Networks with Low Rank Expansions
The focus of this paper is speeding up the evaluation of convolutional neural
networks. While delivering impressive results across a range of computer vision
and machine learning tasks, these networks are computationally demanding,
limiting their deployability. Convolutional layers generally consume the bulk
of the processing time, and so in this work we present two simple schemes for
drastically speeding up these layers. This is achieved by exploiting
cross-channel or filter redundancy to construct a low rank basis of filters
that are rank-1 in the spatial domain. Our methods are architecture agnostic,
and can be easily applied to existing CPU and GPU convolutional frameworks for
tuneable speedup performance. We demonstrate this with a real-world network
designed for scene text character recognition, showing a possible 2.5x speedup
with no loss in accuracy, and a 4.5x speedup with less than a 1% drop in
accuracy, still achieving state-of-the-art results on standard benchmarks.
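The core idea — replacing a full spatial filter with filters that are rank-1 in the spatial domain — can be illustrated with a minimal NumPy/SciPy sketch (not the authors' implementation; the filter and image here are random stand-ins). A k×k kernel is factored via SVD into a k×1 column filter and a 1×k row filter, and convolving with the two 1-D filters in sequence reproduces the rank-1 kernel's output at a fraction of the cost:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
f = rng.standard_normal((5, 5))      # a full 5x5 filter (stand-in)

# best rank-1 approximation of the filter via SVD
U, s, Vt = np.linalg.svd(f)
v_col = U[:, :1] * s[0]              # 5x1 vertical filter
h_row = Vt[:1, :]                    # 1x5 horizontal filter
f1 = v_col @ h_row                   # rank-1 5x5 kernel

img = rng.standard_normal((32, 32))
full = convolve2d(img, f1, mode="valid")
# separable version: column pass then row pass -- same output,
# 2k multiply-adds per pixel instead of k^2
sep = convolve2d(convolve2d(img, v_col, mode="valid"), h_row, mode="valid")
print(np.allclose(full, sep))        # True
```

The paper's schemes go further, approximating whole banks of filters with a shared low-rank basis and exploiting cross-channel redundancy, but the arithmetic saving comes from exactly this separability.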
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
In this work we present a framework for the recognition of natural scene
text. Our framework does not require any human-labelled data, and performs word
recognition on the whole image holistically, departing from the character based
recognition systems of the past. The deep neural network models at the centre
of this framework are trained solely on data produced by a synthetic text
generation engine -- synthetic data that is highly realistic and sufficient to
replace real data, giving us infinite amounts of training data. This excess of
data exposes new possibilities for word recognition models, and here we
consider three models, each one "reading" words in a different way: via 90k-way
dictionary encoding, character sequence encoding, and bag-of-N-grams encoding.
In the scenarios of language based and completely unconstrained text
recognition we greatly improve upon state-of-the-art performance on standard
datasets, using our fast, simple machinery and requiring zero data-acquisition
costs.
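Of the three encodings, the bag-of-N-grams representation is the easiest to sketch: a word is mapped to a binary vector marking which character N-grams from a fixed vocabulary it contains. The toy vocabulary below is an assumption for illustration; the paper's vocabulary is far larger.

```python
def ngrams(word, n):
    """Set of character n-grams contained in a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

# toy vocabulary built from two words (illustrative only)
vocab = sorted(
    ngrams("hello", 1) | ngrams("hello", 2)
    | ngrams("world", 1) | ngrams("world", 2)
)

def encode(word):
    """Binary bag-of-N-grams vector over the vocabulary (n = 1, 2)."""
    grams = ngrams(word, 1) | ngrams(word, 2)
    return [1 if g in grams else 0 for g in vocab]

print(encode("hello"))
```

Unlike the 90k-way dictionary encoding, this representation is compositional: unseen words still map to a meaningful vector as long as their N-grams appear in the vocabulary.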
Deep Structured Output Learning for Unconstrained Text Recognition
We develop a representation suitable for the unconstrained recognition of
words in natural images: the general case of no fixed lexicon and unknown
length.
To this end we propose a convolutional neural network (CNN) based
architecture which incorporates a Conditional Random Field (CRF) graphical
model, taking the whole word image as a single input. The unaries of the CRF
are provided by a CNN that predicts characters at each position of the output,
while higher order terms are provided by another CNN that detects the presence
of N-grams. We show that this entire model (CRF, character predictor, N-gram
predictor) can be jointly optimised by back-propagating the structured output
loss, essentially requiring the system to perform multi-task learning, and
training uses purely synthetically generated data. The resulting model is a
more accurate system on standard real-world text recognition benchmarks than
character prediction alone, setting a benchmark for systems that have not been
trained on a particular lexicon. In addition, our model achieves
state-of-the-art accuracy in lexicon-constrained scenarios, without being
specifically modelled for constrained recognition. To test the generalisation
of our model, we also perform experiments with random alpha-numeric strings to
evaluate the method when no visual language model is applicable.
Comment: arXiv admin note: text overlap with arXiv:1406.222
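The structure of the model's score can be sketched in a few lines: unary terms score each character at each position, and higher-order terms reward detected N-grams, with the predicted word being the argmax over all strings. The random scores and tiny alphabet below are stand-ins for the two CNNs' outputs, and exhaustive decoding is used only because the example is small:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
alphabet = "abc"
L = 4  # word length

# stand-in for the character-predictor CNN: per-position character scores
unary = rng.standard_normal((L, len(alphabet)))
# stand-in for the N-gram detector CNN: scores for detected N-grams
ngram_score = {"ab": 0.5, "bc": 0.3, "ca": -0.2}

def score(word):
    """CRF score = sum of unaries + higher-order N-gram terms."""
    s = sum(unary[i, alphabet.index(c)] for i, c in enumerate(word))
    s += sum(v for g, v in ngram_score.items() if g in word)
    return s

# exhaustive decoding over all |alphabet|^L strings (toy setting only)
best = max(("".join(w) for w in itertools.product(alphabet, repeat=L)),
           key=score)
print(best)
```

In the paper both sets of scores come from CNNs over the whole word image and are trained jointly by back-propagating a structured output loss through this scoring function, rather than being fixed tables as in this sketch.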
Open-ended Learning in Symmetric Zero-sum Games
Zero-sum games such as chess and poker are, abstractly, functions that
evaluate pairs of agents, for example labeling them `winner' and `loser'. If
the game is approximately transitive, then self-play generates sequences of
agents of increasing strength. However, nontransitive games, such as
rock-paper-scissors, can exhibit strategic cycles, and there is no longer a
clear objective -- we want agents to increase in strength, but against whom is
unclear. In this paper, we introduce a geometric framework for formulating
agent objectives in zero-sum games, in order to construct adaptive sequences of
objectives that yield open-ended learning. The framework allows us to reason
about population performance in nontransitive games, and enables the
development of a new algorithm (rectified Nash response, PSRO_rN) that uses
game-theoretic niching to construct diverse populations of effective agents,
producing a stronger set of agents than existing algorithms. We apply PSRO_rN
to two highly nontransitive resource allocation games and find that PSRO_rN
consistently outperforms the existing alternatives.
Comment: ICML 2019, final version
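The nontransitivity at the heart of this paper is easy to make concrete. Viewing a symmetric zero-sum game as an antisymmetric evaluation matrix over agents, rock-paper-scissors shows why "increase in strength" is ill-defined: every agent beats one opponent and loses to another, so no agent dominates and average payoff against the population is uninformative. A minimal illustration (not the paper's PSRO_rN algorithm):

```python
import numpy as np

# antisymmetric evaluation matrix: A[i, j] = payoff of agent i vs agent j
# rows/columns: rock, paper, scissors
A = np.array([[ 0., -1.,  1.],   # rock: loses to paper, beats scissors
              [ 1.,  0., -1.],   # paper: beats rock, loses to scissors
              [-1.,  1.,  0.]])  # scissors: loses to rock, beats paper

# zero-sum symmetry: my gain is your loss
print(np.allclose(A, -A.T))      # True

# every pure strategy has zero average payoff against the population,
# so self-play has no single "strongest" agent to converge to
print(A.mean(axis=1))            # [0. 0. 0.]
```

The paper's geometric framework replaces the scalar "strength" objective with adaptive population-level objectives over such matrices, which is what lets rectified Nash response keep generating useful diversity in cyclic games like this one.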